Support for NVML counters: GPU energy, instant GPU power, avg memory power by mpatrou · Pull Request #525 · icl-utk-edu/papi

mpatrou · 2025-12-19T20:35:32Z

Pull Request Description

Sibling PR (1/2) split into #529

The code adds support for three NVML energy and power counters:
- energy_consumption
- gpu_inst_power
- gpu_memory_avg_power

Author Checklist

Description
Why this PR exists. Reference all relevant information, including background, issues, test failures, etc.
Commits
Commits are self-contained and only do one thing
Commits have a header of the form: module: short description
Commits have a body (whenever relevant) containing a detailed description of the addressed problem and its solution
Tests
The PR needs to pass all the tests

Treece-Burgess · 2025-12-29T18:14:32Z

@mpatrou Thank you for posting this PR, I will take a look through it.

Treece-Burgess · 2025-12-31T15:48:18Z

src/components/nvml/linux-nvml.c

 nvmlReturn_t DECLDIR nvmlDeviceGetMemoryInfo(nvmlDevice_t, nvmlMemory_t *);
 nvmlReturn_t DECLDIR nvmlDeviceGetPerformanceState(nvmlDevice_t, nvmlPstates_t *);
 nvmlReturn_t DECLDIR nvmlDeviceGetPowerUsage(nvmlDevice_t, unsigned int *);
+nvmlReturn_t DECLDIR nvmlDeviceGetTotalEnergyConsumption(nvmlDevice_t, unsigned long long *);


As a note, nvml.h states that this function is supported on Volta and newer.

However, I tested on Voltar at Oregon with a P100 (Pascal) and it did not seem to have an issue.

Treece-Burgess · 2025-12-31T17:34:03Z

src/high-level/papi_hl.c

         verbose_fprintf(stdout, "PAPI-HL Info: The event \"%s\" will be stored as instantaneous value.\n", requested_event_names[i]);
      }

+      // except from nvml energy_consumption delta


Q: A comment above states that all nvml events are stored as instantaneous values. Why would energy_consumption and gpu_inst_power need to be changed to delta and to the new average event_type, respectively?

Also, when I use your updated papi_hl.c, the value of energy_consumption is zero, whereas on the master branch I get a non-zero value.

Regarding the first question.

Energy consumption is a cumulative counter. When we use the papi_hl API to instrument regions of code, we need to read the counter before and after the region and report the difference as the energy consumed by that region; hence the delta calculation.
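A minimal sketch of the delta calculation described above, assuming the counter behaves like nvmlDeviceGetTotalEnergyConsumption (millijoules accumulated since the driver was last reloaded). The function names are hypothetical and not part of PAPI; the same two reads also yield the average power over the region, since mJ / ms equals W.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: energy consumed inside a region, in millijoules,
 * from a read at region begin and a read at region end of a cumulative,
 * monotonically increasing counter. */
uint64_t region_energy_mj(uint64_t start_mj, uint64_t end_mj)
{
    /* A cumulative counter should never go backwards within a region. */
    assert(end_mj >= start_mj);
    return end_mj - start_mj;
}

/* Hypothetical helper: average power over the region in watts.
 * mJ / ms == (1e-3 J) / (1e-3 s) == W. */
double region_avg_power_w(uint64_t start_mj, uint64_t end_mj, double elapsed_ms)
{
    assert(elapsed_ms > 0.0);
    return (double)region_energy_mj(start_mj, end_mj) / elapsed_ms;
}
```

For example, reads of 1000 mJ and 3500 mJ around a 50 ms region give 2500 mJ and 50 W. A delta of zero is what a region that is too short for the counter to advance would report.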

The gpu_inst_power counter could benefit from an average calculation over a region: instead of reporting a single value at the end of the region, the average gives more detailed information.
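The difference between "instantaneous" and "average" handling for a counter like gpu_inst_power can be sketched as a small accumulator. This is a hypothetical illustration, not the papi_hl implementation: each read inside the region adds a sample, "instant" mode would report only the last sample, and "average" mode reports the mean. The mW unit in the comments is only an example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-region accumulator for an instantaneous counter. */
typedef struct {
    double sum;    /* sum of all samples taken in the region, e.g. in mW */
    size_t count;  /* number of samples taken */
    double last;   /* most recent sample: what "instant" mode would report */
} power_avg;

/* Call at region begin. */
void power_avg_reset(power_avg *a)
{
    a->sum = 0.0;
    a->count = 0;
    a->last = 0.0;
}

/* Call on every read of the counter inside the region. */
void power_avg_add(power_avg *a, double sample)
{
    a->sum += sample;
    a->count++;
    a->last = sample;
}

/* Call at region end: mean of all samples, i.e. "average" mode. */
double power_avg_mean(const power_avg *a)
{
    assert(a->count > 0);
    return a->sum / (double)a->count;
}
```

With samples 100, 200 and 300, "instant" mode would report 300 while the average is 200, which shows why a single end-of-region value can hide how power varied within the region.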

P.S. Thank you so much for your very detailed review! I am going through your comments and making changes locally!!

For the second comment: Did you try to profile some code that runs over 1ms or so? to give enough time to get 2 different energy values before and after the code execution.

For the second comment: Did you try to profile some code that runs over 1ms or so? to give enough time to get 2 different energy values before and after the code execution.

I added a sleep of 10 seconds and did end up getting output for it. Thank you for pointing that out.

I understand. A current workaround I thought of is the following workflow:

    PAPI_hl_region_begin("region");
    PAPI_hl_read("region");        // get your initial energy
    // kernel launch / work
    PAPI_hl_region_end("region");

The issue I see with this option is that you would then have to manually or programmatically grab the appropriate values and take the difference.

Could you split this PR into two? (I can also do it if you are currently busy with other work; just let me know.) This PR can contain the updates to the NVML component, and the new PR would contain just the updates to papi_hl.c.

As a side note on why we took that route: the approach in this PR helps profile the energy of applications that use Kokkos and the kokkos-tools PAPI connector (https://github.com/kokkos/kokkos-tools/tree/develop/profiling/papi-connector), with no changes on the Kokkos side.

Sure, no problem. I can split the PR; I'll move the papi_hl.c updates to a new one.

Treece-Burgess · 2025-12-31T17:38:18Z

src/high-level/papi_hl.c

-         /* get cycles for last component */
-         retval = PAPI_read_ts( _local_components[i].EventSet, _local_components[i].values, &_local_cycles );
-      }
+      retval = PAPI_read( _local_components[i].EventSet, _local_components[i].values);


Q: In the PR description you mentioned that "internal_hl_read_counters that wasn't reading the counters for all components in the machines we tested". What exactly was your setup for this, i.e., PAPI configure options, the system you were on, exported PAPI_EVENTS, etc.?

I used the master papi_hl.c with this branch for the three new nvml events you added, and all three show up with values in the resulting .json file.

Treece-Burgess

Final testing was done on Methane at ICL (1 * A100) and Athena at Oregon (4 * A100).
In both cases, PAPI was configured with ./configure --prefix=$PWD/test-install --with-components="nvml" --with-debug=yes.

Methane at ICL (1 * A100)

With CUDA Toolkit 12.9:

papi_component_avail - ✅ (count was updated from 27 in the master branch to 30 in this branch)
papi_native_avail - ✅ (showed the three new nvml native events added in this branch)
papi_command_line - ✅ (all three of the new nvml native events were able to be added and collect counter values for)
HelloWorld.cu - ✅ (for all three of the new nvml native events the test ran successfully)

Athena at Oregon (4 * A100)

Testing on this machine was for compilation purposes only, to make sure that the #if defined guards worked properly. To do this, I used CUDA Toolkit 12.0, in which NVML_POWER_SCOPE_GPU and NVML_POWER_SCOPE_MEMORY do not exist. Results:

PAPI build: ✅
papi_native_avail - ✅ (it did not show the events for gpu_memory_avg_power and gpu_inst_power as expected)
papi_command_line - ✅ (ran successfully with total_energy_consumption)
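A minimal sketch of the kind of #if defined guarding tested above, under stated assumptions: the event table and helper names are hypothetical, and pairing gpu_inst_power with NVML_POWER_SCOPE_GPU and gpu_memory_avg_power with NVML_POWER_SCOPE_MEMORY is inferred from the event names rather than taken from the patch. In linux-nvml.c such a table would follow #include <nvml.h>; it is omitted here so the sketch compiles standalone, which mimics the CUDA 12.0 case where neither macro is defined.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical event table; only events whose NVML macros exist at
 * compile time are built in. */
static const char *nvml_power_events[] = {
    "energy_consumption",    /* nvmlDeviceGetTotalEnergyConsumption: unguarded */
#if defined(NVML_POWER_SCOPE_GPU)
    "gpu_inst_power",        /* assumed to depend on the GPU power scope */
#endif
#if defined(NVML_POWER_SCOPE_MEMORY)
    "gpu_memory_avg_power",  /* assumed to depend on the memory power scope */
#endif
};

/* Number of events compiled into this build. */
size_t nvml_power_event_count(void)
{
    return sizeof nvml_power_events / sizeof nvml_power_events[0];
}

/* True if the named event was compiled into this build. */
bool nvml_power_event_available(const char *name)
{
    for (size_t i = 0; i < nvml_power_event_count(); i++) {
        if (strcmp(nvml_power_events[i], name) == 0)
            return true;
    }
    return false;
}
```

Built without the newer headers, only the energy event is exposed, consistent with papi_native_avail on Athena not listing gpu_memory_avg_power or gpu_inst_power.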

Treece-Burgess · 2026-01-09T15:32:02Z

@mpatrou Thank you for originally submitting the PR and all the changes you made! I did see you created PR #529 and will take a look at that soon!

Treece-Burgess self-requested a review December 29, 2025 18:14

Treece-Burgess requested changes Dec 31, 2025

Treece-Burgess reviewed Jan 6, 2026

mpatrou mentioned this pull request Jan 6, 2026

papi_hl region-based calculations #529 (Open, 3 tasks)

mpatrou changed the title ~~Support for NVML counters: GPU energy, instant GPU power, avg memory power, and papi_hl region-based calculations updates~~ Support for NVML counters: GPU energy, instant GPU power, avg memory power Jan 6, 2026

Treece-Burgess approved these changes Jan 9, 2026

mpatrou added 5 commits January 9, 2026 02:46

gpu energy, instant gpu and avg memory power nvml counters support added, papi_hl delta and average calculations added, issue with components fixed

0600787

nvml energy updates

7bd1815

restore papi_hl changes to split PR

c7ef602

space restored

536c71e

check if macros are defined

4b26585

Treece-Burgess force-pushed the energy_counters branch from 5ca3633 to 4b26585 January 9, 2026 02:46

Treece-Burgess merged commit c827d2c into icl-utk-edu:master Jan 9, 2026
4 checks passed
